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(54) Title: A FAST MULTIPLIERLESS TRANSFORM 
(57) Abstract 



A preferred embodiment 
of the invention provides an 
M x M multiplierless perfect 
reconstruction block transform 
(i.e., perfect reconstruction 
is realized if the invention 
is utilized for both analysis 
and synthesis, otherwise, 
if the invention is used on 
only one end, near-perfect 
reconstruction results) with 
symmetric/antisymmetric 
basis functions, i.e., linear 
phase filter. In a preferred 
embodiment, a cascade of 
±1 butterfly stages (12) is 
provided followed by two 
invertible matrices. Each 
invertible matrix is made up 
of a cascade of lifting steps 
(16) and scaling factors (oi). 
The scaling factors can be 
folded into the quantization 
stepsizes. A forward binDCT 
(e.g., for signal analysis), 
can be implemented using 
a cascade of ±1 butterflies 
(12) and dyadic lifting steps 
(16). Furthermore, an inverse 




binDCT (e.g., for signal synthesis can be easily realized using lifting steps (16) of reverse order and inverted polarity, cascaded with ±1 
butterflies (12). In accordance with an embodiment of the invention, all of the lifting coefficients have been chosen to be dyadic rational 
numbers. Each dyadic lifting step can be constructed by a simple combination of shift-and-add operations. 
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A FAST MULTIPLIERLESS TRANSFORM 



This application claims the benefit of U.S. Provisional Patent Application No. 
60/124,746, filed on March 17, 1999, which is incorporated herein by reference in 
its entirety. 

BACKGROUND OF THE INVENTION 

Mathematical transforms (e.g., the fourier transform, the discrete cosine 
transform, the wavelet transform, etc.) have found application in signal processing 
computer algorithms, particularly the compression and coding of speech, audio, 
image and video data. Such transforms are used in the currently popular image 
compression standard referred to as "JPEG" developed by the Joint Photographic 
Experts Group, as well as in high-performance video coding standards such as MPEG 

* 

and H.263. These operate by dividing up an image or video into sections or blocks, 
performing transforms on the data in the blocks, and compressing or coding the 
coefficients resulting from the transform. 

The JPEG standard, for example, operates by dividing an image into 8x8 
blocks of pixels (i.e., each block is an array of 64 pixels: 8 pixels in height and 8 pixels 
in width). A mathematical transform known as the "discrete cosine transform 
(DCT)" is calculated for each block in the image to produce a data set of coefficients. 



WO 00/55757 



PCT/US00/06941 



2 

A variable length code -based compression technique is then used to compress the 
data set and store the compressed data stream in an output file. The output file can 
then be decompressed to recover the DCT coefficients, the inverse DCT transform 
calculated, and the resulting blocks of pixels displayed to represent the original form 
of the image. 

Mathematically, the DCT transforms the blocks of pixels from the spatial 
representation to the frequency representation. In the frequency representation, the 
image or video information can be divided into frequency components, some of 
which are more important than others. The compression algorithm selectively 
quantizes or discards the frequency components that do not adversely affect the 
reconstructed picture contents. In this manner, compression is achieved. 

From a statistical signal processing standpoint, the DCT is a robust 
approximation to the optimal discrete-time Karhunen-Loeve transform (KLT) of a 
first-order Gauss-Markov process with a positive correlation coefficient p when p -> 
1 . The KLT is optimal in the sense that it provides the most accurate representation 
of a signal by producing transform coefficients that are the eigen values of the input's 
autocorrelation matrix, and as known in the art, the eigen values are the optimum 
achievable result. In other words, the KLT is optimal in the energy compaction 
sense, i.e., among unitary transforms, the KLT packs signal energy into the fewest 
number of coefficients. However, the KLT is signal-dependent, therefore, it is 
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computationally complex and expensive to implement. The DCT has proven to be a 
much better alternative in practice in that it is signal independent, has linear phase, 
has real coefficients, and can be implemented using fast algorithms. 

The DCT coefficients, X[0] through X[M-1], of an M x M transform (i.e., a 
5 transform having M 2 individual signal samples at its input to be processed row by row 
as M basis functions) and the input image pixels, x[0] through x[M-l], are related by 
the following equations: 



M-\ 



X[m\ = a™ ^ x[ri] cos 



n=0 



7nn 
2M 



(2n+l) 



, 0<m<M-l, 



with 



Obn ~ { 



V2/A/, 



\<m< M -\ 



10 



It has a closed-form expression and it is orthogonal. Orthogonality is desirable in a 
transform inasmuch as there is no redundancy among the M transform bases (e.g., 
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they are orthogonal to each other). In the matrix format, we can represent the 
transformation of an input, x, to a transform coefficient, X, as X = Px, and the 
reconstruction of the transform coefficient back into the input x as x = P% where P 
is the DCT transform matrix and P T is its inverse transform matrix. Exploiting the 
5 symmetry of the coefficients cos [(nm/2M) (2n + 1)] in the equation above, the 

i 

functions performed by the DCT transform matrix P can be represented by a series of 
±1 butterflies 12 and rotation angles 14, as illustrated in Fig. 1. The ±1 butterflies 
are a series of crossings which process one pair of inputs at a time (i.e., x[0] through 
x[7]) by adding and subtracting the inputs in pairs. For example, the first butterfly 

10 encountered by the data set x[0] through x[7] shown in Fig. 1 operates on the data 
pair x[0], x[7]. In the top branch of the first butterfly, the term x[0] is added to the 
term x[7], while in the bottom branch, the term x[7] is subtracted from x[0]. 
Similarly, in the second butterfly, the term x[l] is added to the term x[6], while in 
the bottom branch the term x[6] is subtracted from x[l]. In an 8 x 8 transform, the 

15 DCT initially performs four such butterflies (i.e., crossings) for a total of four discrete 
addition operations and four discrete subtraction operations. The DCT rotation 
angles are a series of multiplication operations carried out on the results achieved by 
the ±1 butterflies. As depicted in Fig. 1, "C 7i/4" signifies a multiplication operation 
(i.e., multiplying an input by cosine n/4). Similarly, "S tt/4" signifies a multiplication 

20 operation (i.e., multiplying an input by sine n/4). This process results in a fast DCT 
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implementation. Using this method, eight DCT coefficients X[0] - X[7] can be 
computed using 13 multiplication operations and 29 addition operations. 

Despite all of the above advantages of the DCT, there is still room for 
improvement. For example, the DCT is a floating-point transform, i.e., the cos and 
sin multipliers in the 5 rotation angles {ti/4, 37i/8, 7t/4, 7n/\6, 37t/16} are real 
numbers with infinite precision. Hence, the DCT cannot map integers to integers 
losslessly inasmuch as the product of the inputs, as processed by the ±1 butterflies, 
and the DCT rotation angles result in floating-point outputs that cannot be 
represented in binary form without truncation, thereby resulting in the loss of input 
data. More importantly, floating-point implementations are relatively slow, require 
too much hardware space, and consume too much power. 

To combat these problems, most implementations of the DCT are based on 
integer arithmetic by scaling up the floating-point multipliers by very large factors 
(e.g., multiplying by a factor of 1000 or greater to rid the calculation of floating 
decimal points). Unfortunately, however, this ad-hoc approximation method still has 
problems; for instance, it does not reduce the complexity of the circuit and it still has 
truncation errors. Moreover, if more computational accuracy is needed, the scaling 
factors must be very large, thereby adding more complexity to the implementation. 
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SUMMARY OF THE INVENTION 



A preferred embodiment of the invention provides an M x M multiplierless 
perfect reconstruction block transform (i.e., perfect reconstruction is realized if the 
invention is utilized for both analysis and synthesis, otherwise, if the invention is used 
on only one end, near-perfect reconstruction results) with symmetric/antisymmetric 

4 

basis functions, i.e., linear phase filters. In a preferred embodiment, a cascade of ±1 
butterfly stages is provided followed by two invertible matrices. Each invertible 
matrix is made up of a cascade of lifting steps and scaling factors (a t ). The scaling 
factors can be folded into the quantization stepsizes. A forward binDCT (e.g., for 
signal analysis), can be implemented using a cascade of ±1 butterflies and dyadic 
lifting steps. Furthermore, an inverse binDCT (e.g., for signal synthesis) can be easily 
realized using lifting steps of reverse order and inverted polarity, cascaded with ±1 
butterflies. 

In accordance with an embodiment of the invention, all of the lifting 
coefficients have been chosen to be dyadic rational numbers. Each dyadic lifting step 
can be constructed by a simple combination of shift-and-add operations. 
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BRIEF DESCRIPTION OF THE DRAWINGS 

These and other advantages, features, and applications of the invention will be 
apparent from the following detailed description of the invention which is provided 
in connection with the accompanying drawings in which: 

Fig. 1 is a signal flow graph which depicts a conventional fast implementation 
of an 8 x 8 floating-point DCT; 

Figs. 2(a)-2(c) are signal flow graphs which depict variations on lifting steps 
used in the analysis and synthesis of data samples; 

Fig. 3 is a signal flow graph which depicts a general solution in accordance 
with the invention for a perfect reconstruction block transform whose basis functions 
have linear phase; 

Fig. 4 is a signal flow graph which depicts how an invertible matrix U can be 
decomposed into a cascade of lifting steps and scaling factors (<X;) in accordance with 
a preferred embodiment of the invention; 

Figs. 5-11 depict a forward binDCT in accordance with preferred 
embodiments of the invention; 
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Fig. 12 depicts frequency responses of a binDCT in accordance with a 
preferred embodiment of the invention; 

Fig. 13 depicts a transform and inverse transform matrix of the first 
embodiment of a binDCT; 

5 Fig. 14 depicts peak signal-to-noise ratios (PSNR) realized for two 

embodiments of the binDCT; and 

Fig. 1 5 depicts a simplified block diagram of a transmission system 
incorporating analysis and synthesis filters which employ a binDCT block transform 
in accordance with preferred embodiments of the invention. 
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DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

Preferred embodiments and applications of the invention will now be 
described with reference to Figs. 2-15. Other embodiments may be realized and 
structural or logical changes may be made to the disclosed embodiments without 
departing from the spirit or scope of the invention. Although the invention is 
particularly described as applied to the analysis and synthesis of image and video data 
samples, it should be readily apparent that the invention may be applied to any other 
data samples or sets having the same or similar problems. 

Instead of factoring the DCT into a cascade of rotation angles, as in 
conventional practice in the art, the invention utilizes a series of lifting steps (also 
known as "shear" operations or "ladder" structures). The basic principle behind a 
lifting step is demonstrated in the signal flow graphs depicted in Figs. 2(a)-2(c). In 
accordance with a preferred embodiment of the invention, the denominator of the 
lifting coefficient /, is deliberately chosen to be dyadic in that it is a power of 2 (i.e., a 
rational number that can be written in the form of l/2 m , k } m e Z) for easy binary 
implementation. Fig! 2(b) shows that the lifting step can achieve perfect 
reconstruction of a signal even with the presence of a nonlinear truncation operator 
such as a "floor" (which rounds down to the nearest integer), a "round" (which 
rounds either up or down to the nearest integer), or a "ceiling" (which rounds up to 
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the nearest integer). Fig. 2(c) illustrates how this property is exploited to implement 
the lifting step very efficiently using a combination of addition operations and bit- 
shift operations. The method is always feasible as long as the lifting coefficient is 
dyadic. For example, a lifting coefficient of 3/8 signifies "multiplying" an input 
5 value by 3 followed by a "division" by 8 implemented by a right-shift of 3 binary 
places (i.e., 2 3 = 8 in the denominator). It should be noted that the term 
"multiplying" is actually inaccurate since a series of bit-shift and addition operations 
are performed upon the signal to achieve the same result without performing the 
actual multiplication operation. 

10 As known in the art, binary multiplication may be simplified through a series 

of bit-shift and addition operations. Generally, in order to implement a 
multiplication operation, a multiplicand (i.e., a number that is to be multiplied by 
another) is multiplied by a multiplier (i.e., a number by which a multiplicand is to be 
multiplied). When multiplying binary numbers, bit-shift and addition operations 

15 may be implemented to effectively perform a multiplication without the same level of 
complexity required for multiplication. 

A general description of binary bit-shift and addition operations, as substitutes 
for multiplication is found in U.S. Patent No. 3,691,359 to Dell et ai., which is 
incorporated herein by reference. 
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For example, the numerator 3 of the lifting coefficient 3/8 signifies 
performing the following steps: a) a left-shift of 1 binary place upon the input signal 
to the lifting step; and b) after a left-shift of one place is performed (e.g., the decimal 
value 5 is represented as 101 in binary; after a left-shift, 101 becomes 1010), the 
5 result is then added to the original value of the input signal (e.g., 101), resulting in a 
lifting step output of 1111 binary (or decimal value 15). A multiplication by 3 has 
just been performed with just one bit-shift and one addition operation. In 
accordance with a preferred embodiment of the invention, each of the DCT's 
rotation angles (e.g., 14 of Fig. 1) are replaced with a minimal number of dyadic 
10 lifting steps, yet the resulting transform still retains most of the DCT's superior 
performance. 

The DCT and the novel transformation referenced herein as "binDCT" are 
both MxM block transforms. For purposes of illustration only, an 8 x 8 block will 
be described herein. At the forward transform in the encoder, 8 input samples x[0] - 

15 x[7] are processed at a time, yielding 8 transform coefficients X[0] - X[7] that can 
then be quantized and encoded (e.g., entropy encoded) in a manner known in the 
art. In the decoder, the 8 encoded transform coefficients are entropy decoded, 
inverse quantized, and then fed to the inverse transform module. For ID signals 
such as speech and audio, the input stream can be divided into 8-sample segments, 

20 each is then processed independently. For 2D and 3D signals (images and videos 

respectively), the input can be segmented into non-overlapped 8x8 blocks or 8 x 8 x 
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8 cubes. Each block or cube can be processed independently in separable fashion. 
For example, the ID transformation can be applied to every row, then to every 
column, to obtain a 2D transformation. 

As depicted in Figs. 5-8, each of which depicts a preferred embodiment of the 
invention, the 8 input samples labeled from a{0] to x{7] are first processed in four 
pairs of two by the ±1 butterflies 12. Each of the four ±1 butterflies 12 takes a pair of 
data samples and computes their sum as well as their difference. 

As depicted in Fig. 3, each of the resulting sums may be arranged in the 4 
upper branches Ul, U2, U3, U4, which will be processed by invertible matrix U to 
produce the 4 even-indexed transform coefficients X[0], X[2], X[4], X[6] associated 
with the transform's 4 symmetric basis functions. Likewise, all of the resulting 
differences are fed into the 4 lower branches VI, V2, V3, V4, that will be further 
transformed by invertible matrix V to produce the 4 odd-indexed transform 
coefficients X[l], X[3], X[5], X[7] associated with the transform's 4 antisymmetric 
basis functions. 

In accordance with an embodiment of the invention, the outputs of the 
butterflies are processed significandy different by the binDCT, as compared with the 
DCT. For instance, whereas the DCT relies on rotation angles (e.g., 14 of Fig. 1) 
which are depicted below in matrix form as 
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cos$ -sin 3 
sin$ cos# 



or 



cos 6i sin Oi 
sin 6i - cos 0^ 



the binDCT, in accordance with a preferred embodiment of the invention, relies on 
lifting steps, or cascades of shears (e.g., 16 of Figs. 5-8), a general matrix 
representation of which (to be described more fully below) is depicted below as 



1 /, 
0 1 



1 0 
1 



Since a rotation angle is orthogonal, its inverse is simply its transpose, e.g., 



cos # -sin$ 
sin $ cos # 



cos 0 X - sin Oi 
sin 0 { cos Oi 



cos Oi sin Oi 
-sin$ cos # 



cos $ -sin$ 
sin $ cos$ 



1 0 
0 1 



Inverting a lifting step is achieved simply by inverting its polarity, i.e., 







"1 h 




"1 0" 




1 0" 




' 1 0" 




"1 0" 












, or 






.'i+l l . 




0 I 


0 1 




0 1 




0 1 
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This is always true as long as the diagonal elements of the matrix are unity with the 
same polarity (both +1 or -1). In the case that the diagonal elements have opposite 
polarity, the inverse of the lifting step is simply itself, i.e., 



±1 /,' 


"+1 


h 




"l 0" 


0 +1 

J 


0 






0 1_ 



5 

Therefore, transforms constructed from series of lifting steps are still perfecdy 
invertible. Since the multiplication operations associated with the DCT rotation 
angles require floating-point arithmetic, and since it is very difficult to choose 
rotation angles that yield integer multiplications, in accordance with a preferred 

10 embodiment of the invention, the orthogonality associated with the DCT may be 
sacrificed in the binDCT in favor of lower complexity. Specifically, lifting steps are 
used to replace the DCTs rotation angles, preferably having the denominator of the 
lifting coefficients /, chosen to be dyadic, both the forward and the inverse transform 
can be implemented without using any multiplication (or division) arithmetic 

15 operations, particularly floating point multiplication. Fig. 9 depicts an embodiment 
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of an inverse binDCT which employs inverted lifting steps having dyadic lifting 
coefficients. 



In accordance with another aspect of the invention, specific choices of lifting 
coefficients /, were made to result in high-performance transforms. 

In each of the 4 upper branches of the binDCTs depicted within Figs. 5-8, the 
DCT's 2 butterflies (12a as in Fig. 1) are retained; however, the two upper branch 
DCT rotation angles of rc/4 and 37t/8 can be replaced with one of the following sets 
of lifting steps depicted below in matrix form. 



1 0" 


"1 f 




"l 


-1/2" 


"1 0" 




1 1 














, or 


1 -1 


1/2 -1_ 


0 1 


> 


.0 


1 


1 -1_ 





For example, the upper branch DCT rotation angle of n/4 (14a as seen in 
Fig. 1) is replaced with a pair of lifting steps 16a (of Figs. 5-8). For illustrative 
purposes, consider an input A on the upper branch of the first lifting step of 16a and 
an input of B on the lower branch of the first lifting step of 16a. Consequently, the 
output on the upper branch of the first lifting step is A+B and the output on the 
lower branch of the first lifting step is B. Similarly, the input on the lower branch of 
the second lifting step is B and the input for the upper branch of the second lifting 
step is A+B. Consequently, the output for the upper branch of the second lifting 
step is A+B and the output for the lower branch of the second lifting step is Vi(A+B)- 
B. In accordance with a preferred embodiment of the invention, these lifting steps 
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can be implemented in software and/or hardware using a series of binary shifts and 
adds on the input data to produce the appropriate transform coefficients (e.g., 
X[0]...X[7]) representative of the original image data samples. 

While any one of the above 3 matrix sets (or other equivalent sets apparent to 
5 those of ordinary skill in the art) may be employed to obtain these output results 
(albeit, with differing levels of accuracy), the first matrix set set forth above will be 
derived, by way of example, as a replacement for the upper branch tt/4 rotation angle 

in Fig. 1 . 

One way in which respective outputs of A+B and B for the upper and lower 
10 branches of the first lifting step may be obtained is to multiply the input vector 

"A" 
B 

_ 

by a matrix which represents the first hfting step of 16a: 

"1 f 
0 1 ' 
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Performing matrix multiplication results in the respective upper and lower branch 
outputs of A+B and B in the first lifting step. 

One way in which respective outputs of A+B and Vi(A+B)-B for the upper and 
lower branches of the second lifting step of 16a may be obtained is to multiply the 
vector input to the second lifting step 

'A + B' 
B 



by a matrix which represents the second lifting step of 16a: 

1 0 

1/2 -1 ' 



Using a method similar to that described above for the upper branch DCT 
rotation angle of 7i/4 in the graph of Fig. 1, the upper branch DCT rotation angle of 
in/8 (14b in Fig. 1) may be replaced with the following sets of lifting steps (or 
equivalent lifting steps apparent to those skilled in the art) depicted below in matrix 
form: 
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' 1 0" 




"1 


-3/8" 


, or 


1 0" 




"i r 




1 0" 


3/8 1 




0 


1 


-5/8 1 




0 1 




-5/8 1 



In each of the 4 lower branches of the binDCTs, depicted within Figs. 5-8, 
which yield the odd-indexed transform coefficients (i.e., X[l], X[3], X[5], X[7]), 

* 

three DCT rotation angles (i.e., tc/4 (14d), 7n/\6 (14e), and 3tt/16 (14f, as shown 
in Fig. 1) are also replaced with lifting steps, particularly those lifting steps having 
dyadic denominators in their lifting coefficients. It should be noted that special care 
must be taken in choosing a lifting coefficient to replace rotation angle 7r/4 (14d) in 
the lower branches since it precedes the remaining angles in the structure. The 
following choices of lifting steps (depicted in matrix form) are preferable to yield 
better coding performances although other equivalent lifting steps may be apparent 
to those skilled in the art: 



7t 


-1 5/8 


1 


0 




-1 3/8 


1 0 


1 -7/16 




"-1 7/16" 


" 1 


0" 


"1 


-7/16" 


• 
* 

4 


0 1 _ 


3/8 


1 




0 1 




11/16 1 


0 1 






0 


1 


3/4 


1 


0 


1 




-1 1/2" 


" 1 


0" 


"l 


-1/2" 




-1 3/8" 


" 1 0" 


"l 




-3/8" 




"i r 






























, or 


i -i 


• 








0 1 


3/4 


1 


0 1 
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0 1 


3/4 1 


0 




1 
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Next, the DCT rotation angle 7rc/16 (14e shown in Fig. 1) may be replaced with 
one of the following sets of lifting steps (or equivalent steps) depicted below in 
matrix form: 



16 



1 -1/8 
0 1 



1 0 
3/16 1 



1 -3/16 
0 1 



, or 



1 0 
-13/16 1 



1 1 
0 1 



-J u 



1 0 
-13/16 1 



Finally, the DCT rotation angle 3ti/16 (14f shown in Fig. 1) may be replaced with 
one of the following sets of lifting steps (or equivalent steps) depicted below in 
matrix form: 



1 0" 


"1 


7/8 




1 


0~ 


"l 


1/2" 


1 


0" 




1 


0" 


"l 


1/2" 


1 0" 






















, or 










-1/4 1 

_ _ 


-1/2 1 


0 


1 _ 


5 


-1/8 


1 


0 

_ 


1 

- 


-3/4 


1 




-1/4 


1 


0 


1 _ 



Note that each embodiment of the binDCT still employs the following ±1 butterfly 
(of Fig. 1) which may be represented in matrix form as: 

fl ll 
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and whose inverse is itself scaled down by a factor of two. To recover x[i] exactly on 
the synthesis end, the outputs of the inverse binDCTs have to be attenuated by Vi, 
which is preferably implemented in software and/or hardware as a right-shift of the 
5 data two binary places. It should be noted that in lossless coding, the ±1 butterfly is 
not very efficient since it amplifies the signal by 2. To compensate for this effect, in 
accordance with another aspect of the invention one or more ±1 butterflies (e.g., 12 
in Figs. 5-8) may be replaced with either of the following sets of lifting steps (or 
equivalent steps) as depicted below in matrix form: 



1 -1/2 
0 1 



1 0 
1 -1 



> or 



1 0 
1/2 -I 



1 1 
0 1 



10 



As an example, Figs. 10 and 11 are respectively provided to illustrate lossless 
versions of the embodiment of the binDCT shown in Fig. 5 and its inverse binDCT 
in which the ±1 butterflies (e.g., 12 of Fig. 5-8) are replaced with lifting steps having 
dyadic denominators in their lifting coefficients. Similar modifications may be made 
15 to each embodiment (e.g., as respectively depicted in Figs. 5-8) without departing 
from the spirit or scope of the invention. 



L 
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Fig. 12 depicts a graphical representation of frequency response for a forward 
and inverse binDCT, respectively, in accordance with a preferred embodiment of the 
invention. 

Fig. 13 illustrates a table of coefficients for a transform matrix which may be 
used to produce the same or substantially the same coefficients produced by binDCT, 
in accordance with embodiments of the invention (e.g., as in Fig. 5). The top 
portion of the chart 130 depicts transform coefficients used for signal analysis. The 
bottom portion of the chart 132 depicts transform coefficients used for signal 
synthesis. Each portion of the table 130, 132 is obtained by first converting each 
±1 butterfly and each lifting step within binDCT, e.g., the preferred embodiment of 
the invention shown in Fig. 5, into matrix form. Next, multiplying each of the 
matrices within a basis function (e.g., x[0]) to respectively produce results in the 8 
rows representative of the 8 output coefficients produced using binDCT. That is, the 
Fig. 13 table provides a matrix of transform coefficients, ho through h 7 , which are M 
basis functions of an M x M transform consistent with a preferred embodiment of the 
invention. 

Fig. 14 depicts a table showing empirical data of peak signal-to-noise ratios 
(PSNR) between reconstructed and original images. The table depicts results for 
three separate test images represented in 512 x 512 8-bit grayscale; "Lena," 
"Goldhill" and "Barbara." The table depicts the PSNR for the DCT in the left 
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column of each test image and offers a comparison with two preferred embodiments 
of the invention (of Figs. 5 and 7, respectively). As can be seen in Fig. 14, die 
performance of the binDCT is very similar to that of the DCT. 

Referring now to Fig. 15, a transmission system is depicted in which each 
embodiment of the binDCT may be employed within a respective analysis block 150, 
152, and each inverted binDCT may be employed within a respective synthesis block 
154, 156. At the analysis side, an input signal Y(z) is processed by a pair of analysis 
block 150, 152, each employing an embodiment of the binDCT block transform in 
accordance with the invention. The resulting sub-band signals are then decimated by 
a factor of two. At the synthesis side, the signals Y 0 (z), Y x (z) are interpolated, and 
transformed by analysis blocks 154, 156, each of which employs an embodiment of 
the inverse binDCT block transform in accordance with the invention. The 
synthesized signals are then added to produce the output signal Y(z)'. As should be 
readily apparent, the application of the invention is not limited to a transmission 
system and may be employed for other applications including, but not limited to, 
storing the coded data and reproduction of the data using any given medium. 
Furthermore, although an 8 x 8 transform has been depicted, the implementation of 
the invention is not limited to 8 x 8 transforms and, accordingly, may be used to 
process matrices of any dimension without departing from the spirit or scope of the 
invention. 
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The invention provides a simplified block transform that can be implemented 
using only bit-shift and add operations. The transform replaces DCT rotation angles 
with a series of lifting steps preferably having dyadic denominators in their lifting 
coefficients specifically chosen to allow the binDCT to operate with substantially the 
5 same degree of accuracy as the DCT, but without the DCT's level of complexity. 
The result is a simpler, faster, transform process which retains all of the practical 
benefits of the DCT. 

The novel block transform of the invention, "binDCT," offers at least the 
following advantages: 

10 • The binDCT has a fast, elegant implementation utilizing only shift-and- 

add operations. No multiplication is needed. For example, eight 
transform coefficients can be computed using only 14 bit shift operations 
and 31 addition operations (13 floating point-multiplication operations 
and 29 addition operations are required for the DCT); 



15 



• The binDCT can map integers to integers with exact reconstruction. It 
also allows a unifying lossy/lossless coding framework; 
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• In software implementation, the binDCT is at least 3-4 times faster than 
the floating-point DCT. In hardware implementations, the binDCT may 
be approximately 10 times or more faster than the floating-point DCT; 

• The multiplierless property of the binDCT allows efficient very large scale 
integrated circuit (VLSI) implementations in terms of both chip area and 
power consumption; 

• The binDCT approximates the DCT very closely. Perceptual quantization 
matrices and coding strategies designed specifically for the DCT can be 
applied to the binDCT immediately without any complicated 
modification; and 

• The cascade of a well-tuned forward binDCT followed by an inverse DCT 
(or a forward DCT followed by an inverse binDCT) produces a reasonable 
near-perfect-reconstruction system. 

The binDCT provides very high coding performance. For instance, the 
coding gain (a popular measure of a transform's energy compaction property) of 
several binDCT versions may range in certain examples from 8.77 to 8.82 dB, 
whereas the DCT has a coding gain of 8.83 dB. In image coding experiments, the 
binDCT may be approximately 0.1 - 1.0 dB below the DCT in die signal-to-noise 
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ratios of the reconstructed images in certain examples. The binDCT may be found 
to offer higher visual reconstructed quality than the DCT. 

While preferred embodiments of the invention have been described and 
illustrated, it should be apparent that many modifications can be made to the 
invention and the invention's application without departing from its spirit or scope. 
Accordingly, the invention is not limited by the foregoing description or drawings, 
but is only limited by the scope of the appended claims. 
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What is claimed is: 

1 . A method of processing digital samples, the method comprising: 

processing the digital samples with a plurality of lifting steps, said act of 
processing being multiplierless. 

5 2. The method as in claim 1 further comprising: 

receiving at an input of a transform a plurality of sample segments of discrete 
time signals; and 

generating a plurality of transform coefficients. 

3. The method as in claim 2 further comprising processing the plurality of 
10 sample segments in pairs of two, said act of processing generating a plurality of sums 

and differences of said sample segments. 

4. The method as in claim 1, wherein a lifting coefficient of the lifting 
steps is dyadic. 

5. The method as in claim 3, wherein said act of processing is achieved 
15 with at least one ±1 butterfly. 
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6. The method as in claim 5, wherein said act of processing is achieved 
with a lifting step which, in matrix form, can be expressed as either: 



"l -1/2" 




"1 o" 




" 1 o" 




"l f 








, or 


1/2 -1_ 




0 1 

_ -1 


0 1 




1 -1 







7. The method as in claim 1 further comprising: 

receiving at an input of a transform a plurality of generated transform 
coefficients; and 

generating a plurality of sample segments of discrete time signals. 

8. The method as in claim 7 further comprising processing a plurality of 
sums and differences generated from said act of processing the digital samples with a 
plurality of ±1 butterflies. 

9. The method as in claim 8, wherein said act of processing a plurality of 
sums and differences is achieved with a plurality of lifting steps having dyadic lifting 
coefficients. 
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10. The method as in claim 2, wherein said act of receiving further 
comprises receiving at an input of an M x M transform M sample segments of 
discrete time signals. 

1 1 . The method as in claim 10, wherein said act of generating further 
comprises generating M transform coefficients. 

12. The method as in claim 7, wherein said act of receiving further 
comprises receiving at an input of an inverse M x M transform M generated 
transform coefficients. 

13. The method as in claim 12, wherein said act of generating further 
comprises generating M sample segments of discrete time signals. 

14. The method as in claim 4, wherein one of the following sets of dyadic 
lifting coefficients, as represented in matrix form, is employed: 



"l 


-1/2" 




"1 0" 




"1 0" 




"1 f 




"1 f 


















, or 


1 -1_ 


0 


1 




1 -1_ 




1/2 -1_ 




0 1 





WO 00/55757 



PCT/US00/06941 



29 

15. The method as in claim 4, wherein one of the following sets of dyadic 
lifting coefficients, as represented in matrix form, is employed: 



1 0 

3/8 1 



1 -3/8 
0 1 



,or 



1 0 
-5/8 1 



1 1 
0 1 



1 0 
-5/8 1 



16. The method as in claim 4, wherein one of the following sets of dyadic 
lifting coefficients, as represented in matrix form, is employed: 



-1 5/8 
0 1 

-1 1/2 
0 1 



1 0 
3/8 l u 

1 0 
3/4 1 



-1 3/8 

0 1 

1 -1/2 

0 1 



1 0 

11/16 1 

-1 3/8 
0 1 



1 -7/16 
0 1 

3/4 1 



1 



-1 7/16 
0 1 

1 -3/8 
0 1 



I 

3/4 
1 



or 



1 



0 
1 

1 

-1 



1 

0 



-7/16 
1 



17. The method as in claim 4, wherein one of the following sets of dyadic 
lifting coefficients, as represented in matrix form, is employed: 

1 0" 
-13/16 1 ' 



"l -1/8" 




"l 0" 




1 


-3/16" 


, or 


0 1 




3/16 1 

— — 




0 


1 





1 Oil 
-13/16 1 0 1 
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18. The method as in claim 4, wherein one of the following sets of dyadic 



lifting coefficients, as represented in matrix form, is employed: 



1 0 
-1/2 1 



J L 



1 7/8 
0 1 



1 0 
-1/8 1 



1 1/2 
0 1 



1 0 
-3/4 1 



, or 



1 0 
-1/4 1 



1 1/2 
0 1 



1 0 
-1/4 1 



19. An analysis filter, the filter comprising: 



a receiver for receiving at an input a plurality of sample segments of discrete 



time signals; and 



means for processing the plurality of sample segments of discrete time signals 
with a plurality of lifting steps, the processing being multiplierless, each of the 
plurality of lifting steps having a dyadic lifting coefficient, and the means for 



processing also being capable of generating a plurality of transform coefficients. 
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20. A synthesis filter, the filter comprising: 

a receiver for receiving at an input a plurality of generated transform 
coefficients; and 

means for processing the plurality of generated transform coefficients through 
a plurality of lifting steps, the processing being multiplierless, each of the plurality of 
lifting steps having a dyadic lifting coefficient, and said means for processing also 
being capable of generating a plurality of sample segments of discrete time signals. 

21. A computer readable storage medium containing a computer readable 
code for operating a computer to process digital samples, said code causing said 
computer to: 

process the digital samples with a plurality of lifting steps, said act of 
processing being multiplierless. 
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22. The computer readable storage medium of claim 21, wherein said code 
further causes said computer to: 

receive at an input of a transform a plurality of sample segments of discrete 
time signals; and 

generate a plurality of transform coefficients. 

23. The computer readable storage medium method of claim 22, wherein 
said code further causes said computer to: 

process the plurality of sample segments in pairs of two, thereby generating a 
plurality of sums and differences of said sample segments. 

24. The computer readable storage medium of claim 21, wherein a lifting 
coefficient of the lifting steps is dyadic. 

25. The computer readable storage medium of claim 23, wherein said code 
causes said act of processing to be achieved with at least one ±1 butterfly. 
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26. The computer readable storage medium of claim 25, wherein said act 
of processing is achieved with a lifting step which, in matrix form, can be expressed as 
either: 



"1 -1/2" 




"1 0" 


, or 


" 1 0" 




"i r 


0 1 




1 -1_ 




1/2 -1_ 




0 1 

_ _ 



27. The computer readable storage medium of claim 21, wherein said code 
further causes said computer to: 

receive at an input of a transform a plurality of generated transform 
coefficients; and 

generate a plurality of sample segments of discrete time signals. 

28. The computer readable storage medium of claim 27, wherein said code 
further causes said computer to process a plurality of sums and differences generated 
from said act of processing the digital samples with a plurality of ±1 butterflies. 
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29. The computer readable storage medium of claim 28, wherein said code 
causes said act of processing a plurality of sums and differences to be achieved with a 
plurality of lifting steps having dyadic lifting coefficients. 

30. The computer readable storage medium of claim 22, wherein said code 
5 further causes said computer to receive at an input of an M x M transform M sample 

segments of discrete time signals. 

31. The computer readable storage medium of claim 30, wherein said code 
further causes said computer to generate M transform coefficients. 

32. The computer readable storage medium of claim 27, wherein said code 
10 further causes said computer to receive at an input of an inverse M x M transform M 

generated transform coefficients. 



33. The computer readable storage medium of claim 32, wherein said code 
further causes said computer to generate M sample segments of discrete time signals. 



WO 00/55757 



PCT/US00/06941 



35 

34. The computer readable storage medium of claim 24, wherein said code causes 
one of the following lifting coefficients, as represented in matrix form, to be 
employed: 



"1 -1/2" 


"l 


0" 




"1 0" 




"1 f 


, or 


"1 1" 


0 1 

- -J 


1 


-1_ 


5 


1/2 -1_ 




0 1 




1 -1 



5 35. The computer readable storage medium of claim 24, wherein said code 

causes one of the following lifting coefficients, as represented in matrix form, to be 
employed: 



"1 0" 




"l 


-3/8" 




"1 0" 




"1 f 




"1 0" 










,or 






0 1 




-5/8 1 


3/8 1_ 




0 


1_ 




-5/8 1_ 







36. The computer readable storage medium of claim 24, wherein said code 
10 causes one of the following lifting coefficients, as represented in matrix form, to be 
employed: 



"-1 5/8" 




"1 0" 




"-1 3/8" 




"l 0" 




"1 


-7/16" 




"-1 7/16" 




"1 0" 




"1 -7/16" 


0 1 




3/8 1 




0 1 




11/16 1 




0 


1 


3 


0 1_ 




3/4 1 




0 1_ 



-1 1/2" 




"l 0" 




"l 


-1/2" 




"-1 3/8" 




"1 0" 




"l -3/8" 




"l r 






















, or 


1 -1 


0 1 




3/4 1_ 
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1 
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0 1 




3/4 1_ 




0 1. 
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37. The computer readable storage medium of claim 24, wherein said code 
causes one of the following lifting coefficients, as represented in matrix form, to be 
employed: 



"1 -1/8" 




"1 


— 

0 




"l -3/16" 




"l 0" 




"1 1" 




"1 0" 














, or 


-13/16 1 




0 1_ 




-13/16 1 


0 1 

_ _i 


> 


3/16 


1 




0 1 









38. The computer readable storage medium of claim 24, wherein said code 
causes one of the following lifting coefficients, as represented in matrix form, to be 
employed: 



1 0 
-1/2 1 



1 7/8 
0 1 



1 0 
-1/8 1 



1 1/2 
0 1 



1 0 
-3/4 1 



, or 



1 0 
-1/4 1 



1 1/2 
0 1 



1 0 
-1/4 1 
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